4 research outputs found
Large databases of real and synthetic images for feature evaluation and prediction
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 157-167). Image features are widely used in computer vision applications from stereo matching to panorama stitching to object and scene recognition. They exploit image regularities to capture structure in images both locally, using a patch around an interest point, and globally, over the entire image. Image features need to be distinctive and robust toward variations in scene content, camera viewpoint and illumination conditions. Common tasks are matching local features across images and finding semantically meaningful matches amongst a large set of images. If there is enough structure or regularity in the images, we should be able not only to find good matches but also to predict parts of the objects or the scene that were not directly captured by the camera. One of the difficulties in evaluating the performance of image features in both the prediction and matching tasks is the availability of ground truth data. In this dissertation, we take two different approaches. First, we propose using a photorealistic virtual world for evaluating local feature descriptors and learning new feature detectors. Acquiring ground truth data and, in particular, pixel-to-pixel correspondences between images, in complex 3D scenes under different viewpoint and illumination conditions in a controlled way is nearly impossible in a real world setting. Instead, we use a high-resolution 3D model of a city to gain complete and repeatable control of the environment. We calibrate our virtual world evaluations by comparing against feature rankings made from photographic data of the same subject matter (the Statue of Liberty). We then use our virtual world to study the effects on descriptor performance of controlled changes in viewpoint and illumination.
We further employ machine learning techniques to train a model that would recognize visually rich interest points and optimize the performance of a given descriptor. In the latter part of the thesis, we take advantage of the large amounts of image data available on the Internet to explore the regularities in outdoor scenes and, more specifically, the matching and prediction tasks in street level images. Generally, people are very adept at predicting what they might encounter as they navigate through the world. They use all of their prior experience to make such predictions even when placed in an unfamiliar environment. We propose a system that can predict what lies just beyond the boundaries of the image using a large photo collection of images of the same class, but not from the same location in the real world. We evaluate the performance of the system using different global or quantized densely extracted local features. We demonstrate how to build seamless transitions between the query and prediction images, thus creating a photorealistic virtual space from real world images. by Biliana K. Kaneva. Ph.D.
Evaluation of image features using a photorealistic virtual world
Image features are widely used in computer vision applications. They need to be robust to scene changes and image transformations. Designing and comparing feature descriptors requires the ability to evaluate their performance with respect to those transformations. We want to know how robust the descriptors are to changes in the lighting, scene, or viewing conditions. For this, we need ground truth data of different scenes viewed under different camera or lighting conditions in a controlled way. Such data is very difficult to gather in a real-world setting. We propose using a photorealistic virtual world to gain complete and repeatable control of the environment in order to evaluate image features. We calibrate our virtual world evaluations by comparing against feature rankings made from photographic data of the same subject matter (the Statue of Liberty). We find very similar feature rankings between the two datasets. We then use our virtual world to study the effects on descriptor performance of controlled changes in viewpoint and illumination. We also study the effect of augmenting the descriptors with depth information to improve performance. Quanta Computer (Firm); Shell Research; United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-06-1-0734); United States. Office of Naval Research. Multidisciplinary University Research Initiative. CAREER (Award Number 0747120); United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N000141010933); Microsoft Corporation; Adobe Systems; Google (Firm)
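As an illustration of why pixel-to-pixel ground truth matters for descriptor evaluation, the following is a minimal sketch, not the protocol used in the paper. It assumes descriptors are plain lists of numbers and that `ground_truth[i]` gives the index of the true match in the second image for the i-th descriptor of the first image; the function names `squared_distance` and `match_accuracy` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_accuracy(desc_a, desc_b, ground_truth):
    """Fraction of descriptors in desc_a whose nearest neighbour in desc_b
    is the true correspondence listed in ground_truth (assumed known,
    e.g. from a rendered scene with known geometry)."""
    correct = 0
    for i, d in enumerate(desc_a):
        # Brute-force nearest-neighbour search over the second image.
        nearest = min(range(len(desc_b)),
                      key=lambda j: squared_distance(d, desc_b[j]))
        if nearest == ground_truth[i]:
            correct += 1
    return correct / len(desc_a)


# Toy descriptors: the second set is a shuffled, slightly perturbed copy.
desc_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [4.9, 5.0], [1.0, 0.9]]
accuracy = match_accuracy(desc_a, desc_b, ground_truth=[0, 2, 1])
```

Running the same function on image pairs rendered under different viewpoints or lighting would give one accuracy figure per condition, which is one simple way to compare descriptors.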
Matching and Predicting Street Level Images
The paradigm of matching images to a very large dataset is a powerful one that has been used for numerous vision tasks. If the image dataset is large enough, one can expect to find good matches of almost any image to the database, allowing label transfer [3, 15], and image editing or enhancement [6, 11]. Users of this approach will want to know how many images are required, and what features to use for finding semantically relevant matches. Furthermore, for navigation tasks or to exploit context, users will want to know the predictive quality of the dataset: can we predict the image that would be seen under changes in camera position?
We address these questions in detail for one category of images: street level views. We have a dataset of images taken from an enumeration of positions and viewpoints within Pittsburgh. We evaluate how well we can match those images, using images from non-Pittsburgh cities, and how well we can predict the images that would be seen under changes in camera position. We compare performance for these tasks for eight different feature sets, finding a feature set that outperforms the others (HOG). A combination of all the features performs better in the prediction task than any individual feature. We used Amazon Mechanical Turk workers to rank the matches and predictions of different algorithm conditions by comparing each one to the selection of a random image. This approach can evaluate the efficacy of different feature sets and parameter settings for the matching paradigm with other image categories. United States. Dept. of Defense (ARDA VACE); United States. National Geospatial-Intelligence Agency (NEGI-1582-04-0004); United States. National Geospatial-Intelligence Agency (MURI Grant N00014-06-1-0734); France. Agence nationale de la recherche (project HFIBMR (ANR-07-BLAN-0331-01)); Institut national de recherche en informatique et en automatique (France); Xerox Fellowship Program
Infinite Images: Creating and Exploring a Large Photorealistic Virtual Space
We present a system for generating "infinite" images from large collections of photos by means of transformed image retrieval. Given a query image, we first transform it to simulate how it would look if the camera moved sideways and then perform image retrieval based on the transformed image. We then blend the query and retrieved images to create a larger panorama. Repeating this process will produce an "infinite" image. The transformed image retrieval model is not limited to simple 2-D left/right image translation, however, and we show how to approximate other camera motions like rotation and forward motion/zoom-in using simple 2-D image transforms. We represent images in the database as a graph where each node is an image and different types of edges correspond to different types of geometric transformations simulating different camera motions. Generating infinite images is thus reduced to following paths in the image graph. Given this data structure we can also generate a panorama that connects two query images, simply by finding the shortest path between the two in the image graph. We call this option the "image taxi." Our approach does not assume photographs are of a single real 3-D location, nor that they were taken at the same time. Instead, we organize the photos in themes, such as city streets or skylines, and synthesize new virtual scenes by combining images from distinct but visually similar locations. There are a number of potential applications to this technology. It can be used to generate long panoramas as well as content-aware transitions between reference images or video shots. Finally, the image graph allows users to interactively explore large photo collections for ideation, games, social interaction, and artistic purposes. United States. Army Research Office. Multidisciplinary University Research Initiative (Grant Number N00014-06-1-0734); French National Research Agency (ANR) (project HFIBMR) (ANR-07-BLAN-0331-01); United States. National Geospatial-Intelligence Agency (NEGI-1582-04-0004)
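The image graph and the "image taxi" described in the abstract can be sketched as a shortest-path search. This is only an illustrative sketch under stated assumptions: the adjacency-list layout, the motion labels, the edge costs, and the function name `shortest_image_path` are all hypothetical, and Dijkstra's algorithm is used here as one standard choice; the abstract does not say which shortest-path method the authors use or how edge costs are defined.

```python
import heapq


def shortest_image_path(graph, start, goal):
    """Dijkstra's algorithm over an image graph.

    graph maps an image id to a list of (neighbour, motion, cost) edges,
    where motion names the simulated camera motion (assumed labels) and
    cost is a non-negative transition cost (assumed to be given).
    Returns a list of (motion, image) steps from start to goal,
    or None if goal cannot be reached.
    """
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, motion, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = (node, motion)
                heapq.heappush(heap, (candidate, neighbour))
    if goal not in best:
        return None
    # Walk back from the goal to recover the sequence of transitions.
    steps = []
    node = goal
    while node != start:
        parent, motion = previous[node]
        steps.append((motion, node))
        node = parent
    steps.reverse()
    return steps


# Toy graph with three images and typed edges.
image_graph = {
    "street_a": [("street_b", "translate_right", 1.0),
                 ("skyline_c", "zoom_in", 4.0)],
    "street_b": [("skyline_c", "rotate_right", 1.0)],
    "skyline_c": [],
}
taxi_route = shortest_image_path(image_graph, "street_a", "skyline_c")
```

In this toy graph the route goes through `street_b`, because two cheap transitions cost less than the single direct zoom-in edge. Blending the images along the returned steps would then produce the connecting panorama.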